This paper considers a linear regression problem involving economic
data used by Longley [5] in a study of the performance of regression
programs. The data set is notoriously difficult to handle
computationally. In this paper, the singular value decomposition and
the QR factorization are used to show that very small perturbations in
the data render it colinear, thus accounting for the computational
difficulties. Another analysis, based on coefficients that bound
perturbations in the regression coefficients in terms of perturbations
in the columns of the data, also shows the extreme sensitivity of the
problem.

An analysis is also given of a perturbation index, introduced by Beaton,
Rubin, and Barone [1] to measure the sensitivity of regression problems.
It is shown that the index is valid only for extremely large sample sizes
and is not applicable to the Longley data set.
Another Look at the Longley Data Set

G. W. Stewart


1. Introduction

In a study of programs for solving regression problems, Longley [5]
introduced a set of economic data on which several programs failed to
compute acceptably accurate regression coefficients. Recently Beaton,
Rubin, and Barone [1] (hereafter referred to as BRB) showed that the
regression coefficients are more affected by errors in the data itself
than by rounding errors due to any reasonable computational scheme. In
summary, they made the conservative assumption that the data was accurate
in all reported figures and introduced pseudo-random perturbations
uniformly distributed between -5 and 4.999 in the first unreported digit,
so that the perturbed data rounded back to the original at the assumed
number of significant digits. One thousand such data sets were generated,
and regression coefficients were computed for each set, care being taken
that rounding errors in the computation had negligible effects.

Each regression coefficient was found to vary in both sign and magnitude
over the perturbed data sets. Moreover, the medians of the coefficients
were not near the corresponding coefficients of the original problem, in
spite of the fact that the perturbations in the data were symmetric. To
explain this phenomenon, a limiting solution, valid for large numbers of
observations, was derived, along with a "perturbation index", which
purportedly measures the sensitivity of the regression coefficients to
errors in the data. Beaton, Rubin, and Barone conclude that "the use of
stable algorithms and high precision is not likely to yield a valid answer
without more accurate data" and that the perturbation index should "be
used routinely to indicate the existence of severe instabilities in
regression solutions."

The author agrees wholeheartedly with the first of these conclusions,
especially with the implication that what seem to be numerical problems
may instead be symptoms of more fundamental statistical difficulties.
However, there are easier ways to see that the Longley data set is a hard
case than performing a large simulation experiment. One of the purposes
of this paper is to present three ways, two of which provide a plausible
explanation for the behavior of the medians of the regression
coefficients. Specifically, we shall show that there are data sets with
exact colinearities within the domain of perturbations considered by BRB.
Moreover, we shall give reasons for believing that the perturbations
introduced by BRB actually tend to make the problem better behaved, this
bias accounting for the bias in the coefficients.

The third approach is to compute numbers that measure how sensitive the
individual regression coefficients are to perturbations in the individual
variables of the data set. These sensitivity coefficients immediately
show that no accuracy can be expected in the regression coefficients in
the presence of perturbations of the size considered by BRB.


The results of the sensitivity analysis are at variance with what the
perturbation index implies about the coefficients. Accordingly, a section
of this paper is devoted to an analysis of the asymptotic properties of
the perturbation index, in which it is shown that it is a valid measure
of sensitivity only when the number of observations is very large.

It will sometimes be convenient to cast the results of this paper in the
language of norms. We shall use the vector 2-norm defined for any vector
$x$ by

$$\|x\| = \Bigl(\sum_i x_i^2\Bigr)^{1/2}.$$

For any matrix $X$ we shall use either the Frobenius norm defined by

$$\|X\|_F = \Bigl(\sum_{i,j} x_{ij}^2\Bigr)^{1/2}$$

or the spectral norm defined by

$$\|X\|_2 = \sup_{\|x\|=1} \|Xx\|.$$

The appearance of $\|X\|$ without a subscript in any statement means that
the statement holds for either the Frobenius or the spectral norm. For a
review of the properties of these norms see [6].

I would like to thank Kathy Schmidt for her programming and computational
help and David Hoaglin for his comments on a preliminary version of this
paper.
where $\mathbf{1} = (1,1,\ldots,1)^T$ and $X$ is a 16×6 matrix. The
columns $x_1, x_2, \ldots, x_6$ of $X$ contain observations of six
independent variables, and the vector $y$ contains observations of the
dependent variable. Table 1 contains these observations*, along with the
regression coefficients $\beta_0, \beta_1, \ldots, \beta_6$ (for further
derived data, such as means, correlations, etc., see [1]).

We shall follow BRB in regarding this data as fixed and considering the
effects of perturbations on the regression coefficients. Unless otherwise
stated, the perturbations will be restricted to the interval [-.5, .5] so
that any perturbed data set rounds back to the original. This restriction
is very conservative, since it is unlikely that any of the variables
$x_1, x_2, \ldots, x_6$ are known to more than three figures.

We shall have occasion to work with the adjusted matrix $X_a$, obtained
from $X$ by subtracting column means; i.e.

$$X_a = X - \mathbf{1}m^T,$$

where

$$m = \tfrac{1}{16}\,X^T\mathbf{1}$$

is the vector of column means. Since the adjustment of $X$ is by an
additive factor, a perturbation in $X$ corresponds to an identical
perturbation in $X_a$. However, if we perturb $X_a$ to get $\tilde X_a$
and form $\tilde X = \tilde X_a + \mathbf{1}m^T$, the resulting
$\tilde X$ adjusts back to $\tilde X_a$ if and only if

$$\mathbf{1}^T \tilde X_a = 0. \tag{2.1}$$

Thus if we wish to induce perturbations in $X$ by perturbing $X_a$, we
must take care that (2.1) is satisfied. This point will prove important
in the next two sections.

* For the variable $x_1$ we have reported the original data times ten,
so that the perturbations defined below will have uniform ranges and
variances.
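The role of condition (2.1) can be checked numerically. The following is a minimal sketch, assuming NumPy and a synthetic 16×6 matrix in place of the Longley data; the helper names `adjust` and `adjusts_back` are hypothetical and are not taken from the paper.

```python
import numpy as np

def adjust(X):
    """Subtract column means: returns X_a = X - 1 m^T and the mean vector m."""
    m = X.mean(axis=0)
    return X - m, m

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 100.0, size=(16, 6))    # synthetic stand-in, not the Longley data
Xa, m = adjust(X)

E = rng.uniform(-0.5, 0.5, size=Xa.shape)    # arbitrary perturbation of X_a
E_centered = E - E.mean(axis=0)              # enforce 1^T E = 0, so (2.1) holds

def adjusts_back(Xa_tilde):
    """True if X_tilde = Xa_tilde + 1 m^T adjusts back to Xa_tilde."""
    X_tilde = Xa_tilde + m                   # broadcasting adds m to every row
    return np.allclose(adjust(X_tilde)[0], Xa_tilde)

ok_raw = adjusts_back(Xa + E)                # (2.1) is violated
ok_centered = adjusts_back(Xa + E_centered)  # (2.1) is satisfied
```

Only the perturbation whose columns sum to zero survives the round trip through the adjustment; note that centering `E` is one way to satisfy (2.1) and may move individual entries slightly outside the interval [-.5, .5].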

3.  Singular  value  analysis 

It is well known (e.g. see [6]) that for any n×p matrix $X$ with
$n \ge p$, there are orthogonal matrices $U$ and $V$ such that

$$U^T X V = \begin{pmatrix} M \\ 0 \end{pmatrix}, \tag{3.1}$$

where

$$M = \operatorname{diag}(\mu_1, \mu_2, \ldots, \mu_p)$$

and

$$\mu_1 \ge \mu_2 \ge \cdots \ge \mu_p \ge 0.$$

The decomposition (3.1) is called the singular value decomposition of
$X$. The numbers $\mu_1, \mu_2, \ldots, \mu_p$ are the singular values of
$X$, and the columns of $U$ and $V$ are respectively the left and right
singular vectors of $X$. If $U$ is partitioned in the form

$$U = (U_1, U_2),$$

where $U_1$ is n×p, then

$$X = U_1 M V^T, \tag{3.2}$$

an expression which is sometimes called the singular value factorization
of $X$.

The singular value decomposition has an important approximation
property. Given the integer $k < p$, let

$$\tilde M = \operatorname{diag}(\mu_1, \ldots, \mu_k, 0, \ldots, 0),$$

and let

$$\tilde X = U_1 \tilde M V^T. \tag{3.3}$$

Then $\tilde X$ has rank less than or equal to $k$, and for any n×p
matrix $Y$ of rank less than or equal to $k$

$$\|X - \tilde X\|^2 \le \|X - Y\|^2.$$

Thus $\tilde X$ is a matrix of rank less than or equal to $k$ that is
nearest to $X$ in the least squares sense, and $\tilde X$ is easily
computable from the singular value factorization of $X$.

This method of obtaining nearby matrices with colinearities may be
applied to the Longley data set. If we compute the singular value
decomposition of the matrix $X_a$ for the Longley data set (the LINPACK
code SSVDC was used [3]), we get the following sequence of singular
values, rounded to two places:

$$3.9\cdot10^{5},\quad 4.7\cdot10^{3},\quad 1.7\cdot10^{3},\quad 1.3\cdot10^{3},\quad 3.7\cdot10^{1},\quad 6.7\cdot10^{-1}. \tag{3.4}$$

The smallest singular value is near the error range described in §2.
Accordingly, we set it to zero and compute $\tilde X_a$ in analogy with
(3.3) as

$$\tilde X_a = U_1 \tilde M V^T \tag{3.5}$$

and then form

$$\tilde X = \tilde X_a + \mathbf{1}m^T.$$

In order for this process to be legitimate, the condition (2.1) must be
satisfied, so that the adjusted $\tilde X$ is the rank deficient matrix
$\tilde X_a$. Since $M$ is nonsingular, it follows from (3.2), with $X$
replaced by $X_a$, that

$$U_1 = X_a V M^{-1}.$$

Since $\mathbf{1}^T X_a = 0$, it follows that $\mathbf{1}^T U_1 = 0$ and

$$\mathbf{1}^T \tilde X_a = \mathbf{1}^T U_1 \tilde M V^T = 0,$$

which is just the condition (2.1).
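The construction (3.5) and the verification of (2.1) can be illustrated as follows. This is a sketch under stated assumptions: NumPy's `numpy.linalg.svd` stands in for the LINPACK routine, the data are synthetic rather than the Longley observations, and the function name is hypothetical.

```python
import numpy as np

def suppress_smallest_singular_value(Xa):
    """Return the rank-deficient approximation (3.5) and the singular values."""
    U1, mu, Vt = np.linalg.svd(Xa, full_matrices=False)
    mu_tilde = mu.copy()
    mu_tilde[-1] = 0.0                      # set the smallest singular value to zero
    return (U1 * mu_tilde) @ Vt, mu         # U1 diag(mu_tilde) V^T

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 100.0, size=(16, 6))   # synthetic stand-in for the data
Xa = X - X.mean(axis=0)                     # adjusted matrix
Xa_tilde, mu = suppress_smallest_singular_value(Xa)

column_sums = np.ones(16) @ Xa_tilde        # should vanish, which is (2.1)
largest_change = np.abs(Xa - Xa_tilde).max()
```

No element of the adjusted matrix moves by more than the suppressed singular value, which is why a smallest singular value inside the error range of the data signals a nearby exact colinearity.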

The first and sixth columns of $\tilde X$ are reproduced to eight figures
in Table 2 (the deviations of the other columns were below the level of
rounding error). The largest deviation from $X$ occurs in the year 1951
and has a value of 0.4196. Thus the perturbations are well within the
range described in Section 2. It follows that, for all one knows, the
"true" values of the Longley data set could harbor an exact colinearity.
In particular, within the domain of matrices treated by BRB, there are
points where the regression coefficients fail to exist, and near these
points  the  coefficients  can  become  arbitrarily  large.  Under  the  circum- 


Table 2. Rank Deficient Approximations to X
To see how this may happen, we must look at the effects of perturbations
in a matrix $X$ on its smallest singular value. Let $\tilde X = X + E$,
where we assume that the elements of $E$ are uncorrelated with mean zero
and common variance $\sigma^2$. From (3.2) it is easy to see that the
eigenvalues of $X^TX$ are $\mu_1^2, \mu_2^2, \ldots, \mu_p^2$ with
corresponding eigenvectors $v_1, v_2, \ldots, v_p$, where $v_j$ is the
j-th column of $V$. It follows that the square $\tilde\mu_p^2$ of the
smallest singular value of $\tilde X$ is the smallest eigenvalue of

$$(X + E)^T(X + E) = X^TX + X^TE + E^TX + E^TE.$$

The first order approximation to $\tilde\mu_p^2$ is given by
$v_p^T(X + E)^T(X + E)v_p$ (e.g. see [6]). If we use the facts that
$Xv_p = \mu_p u_p$ and $X^Tu_p = \mu_p v_p$, we obtain

$$
\begin{aligned}
\tilde\mu_p^2 &\cong v_p^TX^TXv_p + v_p^TX^TEv_p + v_p^TE^TXv_p + v_p^TE^TUU^TEv_p \\
&= \mu_p^2 + 2\mu_p u_p^TEv_p + \sum_{i=1}^{n} (u_i^TEv_p)^2 \\
&= \mu_p^2 + 2\mu_p u_p^TEv_p + (u_p^TEv_p)^2 + \sum_{i\ne p} (u_i^TEv_p)^2 \\
&= (\mu_p + u_p^TEv_p)^2 + \tau^2,
\end{aligned}
\tag{3.6}
$$

where

$$\tau^2 = \sum_{i\ne p} (u_i^TEv_p)^2.$$

From the distributional assumptions on $E$ and the orthonormality of the
$u_i$ and the $v_j$, it follows that

$$E(\tau^2) = (n-1)\sigma^2. \tag{3.7}$$

Thus (3.6) partitions the first order approximation to $\tilde\mu_p^2$
into two terms, one the square of a term deviating from $\mu_p$ by a
quantity with standard deviation $\sigma$ and the other a sum of squares
with mean $(n-1)\sigma^2$. As long as $\mu_p$ is sufficiently larger than
$\sigma$, the fluctuations in $\tilde\mu_p$ are almost entirely due to
the first term. But as $\mu_p$ approaches $\sigma$, the second term will
dominate, and tend to increase the value of $\tilde\mu_p$. To summarize
this informal argument: if the elements of a matrix are perturbed by
quantities nearly equal to its smallest singular value, the perturbations
will tend to increase that singular value.

In the case of the BRB experiments, we have $\sigma^2 = 1/12$ and
$\mu_6^2 = .45$. From (3.7) it follows that

$$E(\tau^2) = 1.25.$$

Thus the $\tau^2$ term dominates, and the effect of the perturbations is
for the most part to produce a better behaved problem with $\mu_6$
increased. We believe that this bias toward nicer problems is the cause
of the bias in the perturbed coefficients observed by BRB.
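The upward bias can be observed in a small Monte Carlo experiment. The sketch below is illustrative only: it assumes NumPy, builds a synthetic 16×6 matrix whose singular values are the rounded values in (3.4) (the singular vectors are random, not those of the Longley data), and perturbs every element uniformly on [-.5, .5), so that the variance is 1/12.

```python
import numpy as np

def smallest_sv_under_perturbation(n=16, trials=2000, seed=0):
    """Compare mu_p^2 with the average of the perturbed smallest singular value squared."""
    rng = np.random.default_rng(seed)
    sv = np.array([3.9e5, 4.7e3, 1.7e3, 1.3e3, 3.7e1, 6.7e-1])  # rounded values (3.4)
    p = sv.size
    # Random orthonormal factors (an assumption; the true singular vectors are not used).
    U1, _ = np.linalg.qr(rng.standard_normal((n, p)))
    V, _ = np.linalg.qr(rng.standard_normal((p, p)))
    X = (U1 * sv) @ V.T
    mu_p = np.linalg.svd(X, compute_uv=False)[-1]
    perturbed_sq = np.empty(trials)
    for t in range(trials):
        E = rng.uniform(-0.5, 0.5, size=(n, p))      # variance 1/12 per element
        perturbed_sq[t] = np.linalg.svd(X + E, compute_uv=False)[-1] ** 2
    return mu_p**2, perturbed_sq.mean()

mu_p_sq, mean_perturbed_sq = smallest_sv_under_perturbation()
```

In such runs the average perturbed value lies well above the unperturbed $\mu_p^2 \approx .45$, in line with the informal argument; the first order expression (3.6) is an upper estimate, so the simulated mean need not reach $\mu_p^2 + \sigma^2 + E(\tau^2)$.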

In the foregoing we have taken care to scale the columns of $X$ so that
the presumed uncertainties in the data are all equal. This has the effect
of making the singular values readily interpretable in terms of the
matrix $X$; the suppression of a singular value less than the uncertainty
will cause the elements of $X$ to be perturbed by quantities of the same
magnitude. On the other hand, if the variables were scaled so that the
uncertainties were disparate, the suppression of a small singular value
could overwhelm a still smaller uncertainty in a particular column. We
mention this point because it is a common practice to scale $X_a$ so that
its columns have norm one (in which case the columns of $V$ are the
principal components of the problem). Whatever the merits of this
approach in other circumstances, it is clearly not the thing to do here.


4.  Analysis  via  the  QR  decomposition 

A comparison of Table 2 with Table 1 shows that the perturbations
introduced by the singular value analysis occur mostly in the time
variable $x_6$. This suggests the possibility of obtaining a singular
perturbation of $X$ by changing only the sixth column. In this section we
shall show how the QR decomposition may be used to find such a
perturbation.

Given any n×p matrix $X$ with $n \ge p$, there is an orthogonal matrix
$Q$ such that

$$Q^T X = \begin{pmatrix} R \\ 0 \end{pmatrix}, \tag{4.1}$$

where $R$ is upper triangular (e.g. see [6]). This decomposition of $X$
is called the QR decomposition. If we write $Q = (Q_1, Q_2)$, where $Q_1$
is n×p, then it follows from (4.1) that

$$X = Q_1 R, \tag{4.2}$$

and (4.2) is called the QR factorization of $X$.
The QR decomposition is a useful computational and theoretical tool in
linear regression; however, for our purposes we need only the following
approximation theorem, which appears to be new.

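Theorem 3.1. In the QR decomposition (4.1), suppose that $R$ is
nonsingular. Let $\tilde R$ be obtained from $R$ by setting $r_{pp} = 0$,
and let

$$\tilde X = Q_1 \tilde R. \tag{4.3}$$

Then $\tilde X$ differs from $X$ only in its p-th column, and
rank($\tilde X$) = p-1. Moreover, if $Y$ is an n×p matrix that differs
from $X$ only in its p-th column and satisfies rank($Y$) ≤ p-1, then

$$\|X - \tilde X\| \le \|X - Y\|. \tag{4.4}$$

Proof. By construction $R$ and $\tilde R$ differ only in their
(p,p)-elements. Hence $X$ and $\tilde X$ differ only in their p-th
columns. Moreover, $\tilde R$ is of rank p-1, and therefore so is
$\tilde X$.

To establish (4.4), let $R$ be partitioned in the form

$$R = \begin{pmatrix} R' & r \\ 0 & r_{pp} \end{pmatrix},$$

and let the p-th column $y_p$ of $Y$ satisfy

$$Q^T y_p = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix},$$

where $z_1$ is a (p-1)-vector and $z_2$ is a scalar. Then

$$Q^T \tilde X = \begin{pmatrix} R' & r \\ 0 & 0 \\ 0 & 0 \end{pmatrix}$$

and

$$Q^T Y = \begin{pmatrix} R' & z_1 \\ 0 & z_2 \\ 0 & z_3 \end{pmatrix}.$$

It follows that

$$\|X - \tilde X\|^2 = r_{pp}^2. \tag{4.5}$$

Now for $Y$ to have rank p-1, the quantities $z_2$ and $z_3$ must be
zero, since $R'$ is nonsingular. Hence

$$\|X - Y\|^2 = \|r - z_1\|^2 + r_{pp}^2. \tag{4.6}$$

The inequality (4.4) follows from (4.5) and (4.6).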
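The construction in the theorem is easy to try out. The following is a minimal sketch, assuming NumPy's reduced QR factorization and a random test matrix; the function name `zero_last_pivot` is hypothetical.

```python
import numpy as np

def zero_last_pivot(X):
    """Set r_pp = 0 in the QR factorization X = Q1 R and rebuild the matrix, as in (4.3)."""
    Q1, R = np.linalg.qr(X)           # reduced QR: Q1 is n x p, R is p x p
    r_pp = R[-1, -1]
    R_tilde = R.copy()
    R_tilde[-1, -1] = 0.0
    return Q1 @ R_tilde, r_pp

rng = np.random.default_rng(2)
X = rng.standard_normal((16, 6))      # synthetic test matrix
X_tilde, r_pp = zero_last_pivot(X)

dist = np.linalg.norm(X - X_tilde)                 # Frobenius norm of the change
mu_p = np.linalg.svd(X, compute_uv=False)[-1]      # distance to the nearest singular matrix
```

The change is confined to the last column and has norm $|r_{pp}|$, in agreement with (4.5). Since the singular value approximation is allowed to alter every column, $\mu_p$ can never exceed $|r_{pp}|$.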

The application of this theorem to the Longley data set is similar to
the singular value analysis. The QR decomposition of the matrix $X_a$ was
computed by the LINPACK routine SQRDC [3]. The element

$$r_{66} = 0.6693051$$

of $R$ was set to zero to give $\tilde R$, and $\tilde X_a$ was computed
in analogy with (4.3) as

$$\tilde X_a = Q_1 \tilde R.$$

An argument similar to the one in the last section establishes that
$\mathbf{1}^T Q_1 = 0$. Hence $\tilde X_a$ satisfies (2.1), and we may
add means as usual to get $\tilde X$. The sixth column of $\tilde X$,
which is the only one that has been altered, has been appended to
Table 2. The largest deviate corresponds to the year 1951 and has a value
of 1951.420 (a deviation of 0.420), so that $\tilde X$ again lies within
the range of perturbations considered by BRB.

We observed in connection with the singular value decomposition that
perturbations could tend to make a problem better behaved. Much the same
thing can occur with errors introduced into a single column.
Specifically, let $X$ have the QR decomposition (4.1) and let $\tilde X$
be obtained from $X$ by adding to $x_p$ a vector $e$ whose elements are
uncorrelated with mean zero and common variance $\sigma^2$. Let
$f = Q^Te$. Then if we partition $f$ in the form

$$f = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix},$$

where $f_1$ is a (p-1)-vector and $f_2$ is a scalar, we have, in the
notation used above,

$$Q^T \tilde X = \begin{pmatrix} R' & r + f_1 \\ 0 & r_{pp} + f_2 \\ 0 & f_3 \end{pmatrix}.$$

It follows that the (p,p)-element of $\tilde R$ satisfies

$$\tilde r_{pp}^2 = (r_{pp} + f_2)^2 + \|f_3\|^2, \tag{4.7}$$

and

$$E(\|f_3\|^2) = (n-p)\sigma^2.$$

If $r_{pp}$ is near $\sigma$ in magnitude, the term $\|f_3\|^2$ will tend
to dominate and increase $\tilde r_{pp}$.

For perturbations in the variable $x_6$ of the Longley data set we have

$$(n-p)\sigma^2 = \tfrac{10}{12} = .83,$$

which clearly dominates $r_{pp}^2 = .45$. Although this analysis does not
strictly apply to the perturbations considered by BRB, since it assumes
the other variables are not perturbed, it nonetheless gives a fair
indication of what is going on. In ten simulations, done by the author
for other purposes, it was observed that the average value of
$\tilde r_{pp}^2$ was 1.2, which is in fair agreement with the
expectation 1.37 of $\tilde r_{pp}^2$ in (4.7).
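The expectation quoted from (4.7) can be reproduced by simulation. The sketch below makes several assumptions that are not in the paper: NumPy is used, the matrix is synthetic with $r_{pp}$ set to 0.67 by construction, and only the last column is perturbed, uniformly on [-.5, .5). Taking expectations in (4.7) gives $r_{pp}^2 + (n-p+1)\sigma^2$.

```python
import numpy as np

def simulate_last_pivot(n=16, p=6, r_pp=0.67, trials=5000, seed=3):
    """Average of the squared last pivot when only the last column is perturbed."""
    rng = np.random.default_rng(seed)
    Q1, _ = np.linalg.qr(rng.standard_normal((n, p)))   # random orthonormal columns
    R = np.triu(rng.uniform(1.0, 10.0, size=(p, p)))    # arbitrary upper triangle
    R[-1, -1] = r_pp                                     # prescribe the last pivot
    X = Q1 @ R
    sigma2 = 1.0 / 12.0
    squares = np.empty(trials)
    for t in range(trials):
        X_tilde = X.copy()
        X_tilde[:, -1] += rng.uniform(-0.5, 0.5, size=n)
        R_tilde = np.linalg.qr(X_tilde, mode="r")        # triangular factor only
        squares[t] = R_tilde[-1, -1] ** 2
    expected = r_pp**2 + (n - p + 1) * sigma2            # expectation of (4.7)
    return squares.mean(), expected

mean_sq, expected_sq = simulate_last_pivot()
```

With these inputs the expected value is about 1.37, the figure given in the text, and the simulated mean should fall close to it.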

For this data set, the QR decomposition yields much the same results as
the singular value decomposition. However, this is in part due to the
fortuitous ordering of the variable $x_6$; another ordering of the
columns could give different results. In general, it may be necessary to
inspect $r_{pp}$ for different orderings. There is no need to examine all
$2^{p-1}$ orderings, since the value of $r_{pp}$ depends only on the
variable that is placed last and not on the ordering of the other
variables. Efficient algorithms exist to determine these p different
values of $r_{pp}$ after $R$ has been computed once for a specific
ordering [3, Ch. 10].
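One way to obtain all p values from a single factorization is sketched below. It is not the LINPACK procedure of [3, Ch. 10]; it rests instead on the standard identity that, with variable j placed last, $r_{pp}^2$ is the reciprocal of the j-th diagonal element of $(X^TX)^{-1} = R^{-1}R^{-T}$. NumPy is assumed, the function names are hypothetical, and the explicit inverse is used only for brevity.

```python
import numpy as np

def last_pivots(X):
    """|r_pp| for each variable placed last, from one QR factorization."""
    R = np.linalg.qr(X, mode="r")
    R_inv = np.linalg.inv(R)                       # a triangular solve would be preferable
    # j-th diagonal element of (X^T X)^{-1} is the squared norm of row j of R^{-1}.
    return 1.0 / np.linalg.norm(R_inv, axis=1)

def last_pivot_direct(X, j):
    """Reference value: refactor with column j moved to the last position."""
    cols = [k for k in range(X.shape[1]) if k != j] + [j]
    return abs(np.linalg.qr(X[:, cols], mode="r")[-1, -1])

rng = np.random.default_rng(5)
X = rng.standard_normal((16, 6))                   # synthetic test matrix
pivots = last_pivots(X)
direct = np.array([last_pivot_direct(X, j) for j in range(X.shape[1])])
```

A small entry in `pivots` flags a variable that is nearly a linear combination of the others, whatever the original column order.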


5.  Sensitivity  coefficients 

The results of the last two sections suggest that the regression
coefficients for the Longley data set will be extremely sensitive to
perturbations in the variable $x_6$ and, to a lesser extent, in the
variable $x_1$. For sufficiently small perturbations, we can make this
precise by computing linear approximations to the perturbations in the
regression coefficients. In this section we shall summarize the results
of such an approach. The reader will find details in [4] or [8].

In a general regression model with regression matrix $X$, assume that
$X$ is of full rank so that the vector of least squares coefficients is
given by

$$\beta = (X^TX)^{-1}X^Ty = CX^Ty \equiv X^+y, \tag{5.1}$$

where for later use we have set

$$C = (X^TX)^{-1}$$

and

$$X^+ = (X^TX)^{-1}X^T$$

($X^+$ is the pseudo-inverse of $X$). From (5.1) it is evident that if
$\tilde X$ is restricted to a sufficiently small neighborhood of $X$,
then $\tilde\beta = \tilde X^+y$ is a differentiable function of
$\tilde X$. In particular, if we write $\tilde\beta_i$ as a function of
the j-th column of $\tilde X$, say

$$\tilde\beta_i = f_{ij}(\tilde x_j),$$

then $\tilde\beta_i$ can be expressed in the form

$$\tilde\beta_i = \beta_i + f'_{ij}(x_j)(\tilde x_j - x_j) + O(\|\tilde x_j - x_j\|^2), \tag{5.2}$$

where the row vector $f'_{ij}(x_j)$ is the gradient of $f_{ij}$ evaluated
at $x_j$. It turns out that there is an easy expression for
$f'_{ij}(x_j)$ and its norm $\gamma_{ij}$.

Theorem 4.1 [4,8]. Let $x_i^{(+)}$ denote the i-th row of $X^+$, and let

$$r = y - X\beta.$$

Then

$$f'_{ij}(x_j) = -\beta_j x_i^{(+)} + c_{ij} r^T$$

and

$$\gamma_{ij}^2 \equiv \|f'_{ij}(x_j)\|^2 = \beta_j^2 c_{ii} + c_{ij}^2 \|r\|^2.$$

There are two ways in which this theorem can be applied. In the first
place, it follows from (5.2) that

$$|\tilde\beta_i - \beta_i| \le \gamma_{ij}\|\tilde x_j - x_j\| + O(\|\tilde x_j - x_j\|^2).$$

Thus if we can place a bound on the size of the perturbation
$\tilde x_j - x_j$ in $x_j$ and the perturbation is sufficiently small*,
then $\gamma_{ij}\|\tilde x_j - x_j\|$ estimates the perturbation in
$\beta_i$ due to the perturbation in $x_j$. For this reason we shall call
$\gamma_{ij}$ a sensitivity coefficient.

* This will be true if $\|X^+\|\,\|\tilde x_j - x_j\|$ is significantly
less than one, say less than 0.2, a result which can be derived from
theorems in [7].
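The formula for the sensitivity coefficients can be checked against a finite-difference gradient. The sketch below assumes NumPy and a synthetic, well-conditioned problem; the function names are hypothetical, and forming $(X^TX)^{-1}$ explicitly is done for clarity only (for a problem as ill-conditioned as the Longley data one would work from the QR factorization instead).

```python
import numpy as np

def sensitivity_coefficients(X, y):
    """Return beta and the matrix gamma with gamma[i, j]^2 = beta_j^2 c_ii + c_ij^2 ||r||^2."""
    C = np.linalg.inv(X.T @ X)
    beta = C @ (X.T @ y)
    r = y - X @ beta
    gamma = np.sqrt(np.outer(np.diag(C), beta**2) + C**2 * (r @ r))
    return beta, gamma

def numerical_gradient_norm(X, y, i, j, h=1e-6):
    """Norm of the gradient of beta_i with respect to column j, by central differences."""
    grad = np.empty(X.shape[0])
    for k in range(X.shape[0]):
        Xp, Xm = X.copy(), X.copy()
        Xp[k, j] += h
        Xm[k, j] -= h
        bp = np.linalg.lstsq(Xp, y, rcond=None)[0][i]
        bm = np.linalg.lstsq(Xm, y, rcond=None)[0][i]
        grad[k] = (bp - bm) / (2.0 * h)
    return np.linalg.norm(grad)

rng = np.random.default_rng(4)
X = rng.standard_normal((16, 6))                        # synthetic regression matrix
y = X @ np.arange(1.0, 7.0) + 0.1 * rng.standard_normal(16)
beta, gamma = sensitivity_coefficients(X, y)
```

Dividing row i of `gamma` by $|\beta_i|$ gives scaled coefficients of the kind reported for the Longley data.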

A second approach is to make distributional assumptions about the
components of $\tilde x_j - x_j$, say that they are independently
distributed with means zero and common variances $\sigma^2$. Then the
variance of the approximation

$$\hat\beta_i = \beta_i + f'_{ij}(x_j)(\tilde x_j - x_j) \tag{5.3}$$

is $\gamma_{ij}^2\sigma^2$, so that again $\gamma_{ij}$ estimates the
variability of $\beta_i$ due to perturbations in $x_j$. However, some
care is required here. If the distribution of $\tilde x_j - x_j$ is
continuous and nonzero at a singularity of $\tilde X$, then we cannot
guarantee that the moments of $\tilde\beta_i$ exist. This will always be
the case if $\tilde x_j - x_j$ is normally distributed. Intuition
suggests that if $\sigma^2$ is small enough then $\hat\beta_i$ will
accurately approximate $\tilde\beta_i$ except in a region of low
probability, so that $\gamma_{ij}^2\sigma^2$ will adequately describe the
variability of $\tilde\beta_i$; however, this area needs further study.

The sensitivity coefficients can easily be computed from quantities
normally generated in the course of solving regression problems. We have
done this for the matrix $X_a$ obtained from the Longley data set and the
adjusted vector $y_a = y - (\mathbf{1}^Ty)\mathbf{1}/16$. Since the
regression coefficients differ widely in magnitude, we report
$\gamma_{ij}/|\beta_i|$ in Table 3. These scaled coefficients measure the
sensitivity of the relative error $|\tilde\beta_i - \beta_i|/|\beta_i|$;
if this error is less than $10^{-s}$ then $\beta_i$ and $\tilde\beta_i$
agree to about s significant figures.
The coefficients confirm our conclusions about the sensitivity of the
problem to perturbations in $x_1$ and $x_6$. For example, the coefficient
$\gamma_{16}/|\beta_1|$ is 34. If we consider perturbations introduced by
rounding the elements of $x_6$ in the s-th place beyond the decimal, then
the maximum such perturbation is $\pm5\cdot10^{-s}$. It follows that
$\|\tilde x_6 - x_6\| \le 20\cdot10^{-s}$; hence

$$\frac{|\tilde\beta_1 - \beta_1|}{|\beta_1|} \lesssim 34\cdot20\cdot10^{-s}.$$

To be sure of one figure of accuracy in $\beta_1$, we must have
$34\cdot20\cdot10^{-s} < .1$, or $s \ge 4$. The corresponding
perturbation of $\pm5\cdot10^{-4}$ years amounts to about ±4.4 hours.
Although this is a worst case analysis, it reflects the extreme
sensitivity of $\beta_1$ to perturbations in $x_6$; a probabilistic
analysis would give only slightly less dramatic results. The sensitivity
coefficients also show that $\beta_1$ and $\beta_6$ are quite sensitive
to perturbations in $x_1$.
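The arithmetic behind this worst case bound can be retraced in a few lines. This is a plain-Python sketch using only the figures quoted above; the factor 20 is taken to be $\sqrt{16}\cdot5$, the norm of a 16-vector whose elements are each at most $5\cdot10^{-s}$ in magnitude, and the year length of 365.25 days is an assumption.

```python
import math

GAMMA_OVER_BETA = 34.0     # scaled sensitivity coefficient quoted in the text
N_OBS = 16                 # number of observations

def relative_error_bound(s):
    """Worst-case bound 34 * 20 * 10^-s on the relative error in beta_1."""
    max_element = 5.0 * 10.0 ** (-s)
    norm_bound = math.sqrt(N_OBS) * max_element     # equals 20 * 10^-s
    return GAMMA_OVER_BETA * norm_bound

# Smallest s for which the bound guarantees one figure of accuracy.
smallest_s = next(s for s in range(1, 20) if relative_error_bound(s) < 0.1)

# Size of a 5e-4 year perturbation, expressed in hours.
hours = 5e-4 * 365.25 * 24
```

The bound is 0.68 for s = 3 and 0.068 for s = 4, so four decimal places in the year variable are the least that will do.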

We must insert a word of caution here. The results of the last three
sections all agree in condemning the variable $x_6$ as a troublemaker,
and to a lesser extent the variable $x_1$. It is tempting to conclude
that all will be well if we exclude $x_6$ and $x_1$ from the model.
However, the sensitivity of the coefficients to $x_1$ and $x_6$ is a
function of the entire model. There is no reason to expect that either
$x_6$ or $x_1$ cannot behave themselves in a reduced model. The
techniques we have described in this paper are designed to detect
trouble, not to remedy it, and we discourage their naive application to
the variable selection problem.
6. Limitations of a perturbation index.

The first order perturbation theory of the last section is sharp in
proportion as the variance is small. A different approach would fix the
variance of the errors and investigate what happens as n becomes large.
This case has been analyzed in [1] and [2]. In this section we shall be
concerned with how large n must be for the analyses to be applicable.

The basic results are derived as follows. We begin with a sequence of
regression problems with full rank n×p matrices $X_n$ (n = 1,2,...) and
observation vectors $y_n$ (n = 1,2,...). The coefficient vectors
$\beta_n$ are given by

$$\beta_n = (X_n^TX_n)^{-1}X_n^Ty_n.$$

We suppose further that there is a limit problem in the sense that there
is a positive definite matrix $A$, a p-vector $c$, and a scalar $\eta$
such that

$$\lim_{n\to\infty} n^{-1}X_n^TX_n = A, \qquad \lim_{n\to\infty} n^{-1}X_n^Ty_n = c, \tag{6.1}$$

and

$$\lim_{n\to\infty} n^{-1}\|y_n\|^2 = \eta^2. \tag{6.2}$$

It follows that

$$\lim_{n\to\infty} \beta_n = A^{-1}c \equiv \beta.$$
Now suppose that we are actually given the matrices

$$\tilde X_n = X_n + E_n,$$

where the elements of $E_n$ are assumed to be uncorrelated with mean zero
and common variance $\sigma^2$. The coefficient vector obtained by
working with $\tilde X_n$ instead of $X_n$ is given by

$$
\begin{aligned}
\tilde\beta_n &= (\tilde X_n^T\tilde X_n)^{-1}\tilde X_n^Ty_n \\
&= \bigl[n^{-1}(X_n^TX_n + X_n^TE_n + E_n^TX_n + E_n^TE_n)\bigr]^{-1}\, n^{-1}(X_n^Ty_n + E_n^Ty_n).
\end{aligned}
\tag{6.3}
$$

The limits in probability of the terms in the right hand side of (6.3)
can easily be evaluated. From the assumptions on $E_n$ we have
immediately that

$$\operatorname*{plim}_{n\to\infty} n^{-1}(E_n^TE_n) = \sigma^2 I.$$

Next, from (6.1) it follows that if $x_i^{(n)}$ denotes the i-th column
of $X_n$, then

$$\lim_{n\to\infty} n^{-1}\|x_i^{(n)}\|^2 = a_{ii}. \tag{6.4}$$

Hence $n^{-1/2}X_n$ is bounded and

$$\operatorname*{plim}_{n\to\infty} \frac{X_n^TE_n}{n} = 0.$$

Finally, from (6.2) it follows that $n^{-1/2}y_n$ is bounded and

$$\operatorname*{plim}_{n\to\infty} \frac{E_n^Ty_n}{n} = 0.$$

Hence

$$\operatorname*{plim}_{n\to\infty} \tilde\beta_n = (A + \sigma^2 I)^{-1}c. \tag{6.5}$$

Equation (6.5) shows clearly that plim $\tilde\beta_n$ differs from the
true solution $\beta$ by quantities that depend on the variance of $E$.
We may obtain specific bounds for this difference by applying results
from standard matrix perturbation theory (e.g. see [6]). Specifically, if

$$\sigma^2\|A^{-1}\| < 1, \tag{6.6}$$

then $(A + \sigma^2 I)$ is nonsingular and

$$\frac{\|\beta - \operatorname{plim}\tilde\beta_n\|}{\|\beta\|} \le \frac{\sigma^2\|A^{-1}\|}{1 - \sigma^2\|A^{-1}\|}. \tag{6.7}$$

Since trace$(A^{-1}) \ge \|A^{-1}\|$, we may replace the condition (6.6)
by

$$\sigma^2\,\mathrm{trace}(A^{-1}) < 1$$

and the bound (6.7) by

$$\frac{\|\beta - \operatorname{plim}\tilde\beta_n\|}{\|\beta\|} \le \frac{\sigma^2\,\mathrm{trace}(A^{-1})}{1 - \sigma^2\,\mathrm{trace}(A^{-1})}. \tag{6.8}$$

The right hand side of (6.7) or (6.8) bounds the relative error in the
vector $\beta$. If it is of order $10^{-s}$, then the largest components
of plim $\tilde\beta_n$ will be in error in about their s-th digit. The
bounds may then be interpreted as saying that if either
$\sigma^2\|A^{-1}\|$ or $\sigma^2\,\mathrm{trace}(A^{-1})$ is near one,
the plim of $\tilde\beta_n$ may differ entirely from $\beta$. For this
reason, BRB call $\sigma^2\,\mathrm{trace}(A^{-1})$ the perturbation
index for the problem and recommend that it be monitored to determine the
sensitivity of the problem to errors in the variables.

To compute the perturbation index for the Longley data set, we
approximate $A \cong X_a^TX_a/16$. Now

$$\mathrm{trace}\bigl[(X_a^TX_a)^{-1}\bigr] = \frac{1}{\mu_1^2} + \frac{1}{\mu_2^2} + \cdots + \frac{1}{\mu_6^2},$$

where the $\mu_i$ are the singular values displayed in (3.4). Thus

$$\mathrm{trace}(A^{-1}) = 36.8.$$

For uniform errors in the first unreported figure, we have
$\sigma^2 = 1/12$, so that the perturbation index is about three, which
gives ample warning of trouble.
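The computation of the index from the singular values can be sketched as follows, assuming NumPy and using the two-figure values of (3.4). Because those values are rounded, the trace comes out near 35.7 rather than the 36.8 reported from the unrounded singular values; the index itself is affected only slightly.

```python
import numpy as np

# Singular values of X_a, rounded to two places as in (3.4).
mu = np.array([3.9e5, 4.7e3, 1.7e3, 1.3e3, 3.7e1, 6.7e-1])
n_obs = 16

# A is approximated by X_a^T X_a / 16, so trace(A^{-1}) = 16 * sum(1 / mu_i^2).
trace_A_inv = n_obs * np.sum(1.0 / mu**2)

def perturbation_index(sigma2):
    """BRB perturbation index sigma^2 * trace(A^{-1})."""
    return sigma2 * trace_A_inv

index_first_digit = perturbation_index(1.0 / 12.0)      # errors in first unreported figure
index_second_digit = perturbation_index(1.0 / 1200.0)   # errors in second unreported figure
```

The sum is dominated almost entirely by the smallest singular value, so the index is in effect a statement about $\sigma^2/\mu_6^2$.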

However, if we consider errors in the second unreported figure, we have
$\sigma^2 = 1/1200$, and hence the perturbation index is about 0.03, a
value which promises reasonable accuracy in plim $\tilde\beta_n$. On the
other hand the sensitivity coefficients suggest that the relative error
in the coefficient $\beta_1$ due to perturbations of this kind in $x_6$
will be of order of magnitude

$$\frac{\gamma_{16}\,\sigma}{|\beta_1|} = \frac{34}{\sqrt{1200}} \approx 1.$$

Thus we can expect no accuracy in $\beta_1$ in spite of the small
perturbation index.

The cause of the difficulty is that n must be very large for
$\tilde\beta_n$ to approximate its plim with any degree of certainty.
Returning to (6.3), we see that replacing the matrix
$n^{-1}(X_n^TE_n + E_n^TX_n)$ by its plim of zero can only be justified
if it is small in probability compared with the plim of
$n^{-1}E_n^TE_n$. In particular, the variance of a diagonal element of
$n^{-1}(X_n^TE_n + E_n^TX_n)$ is

$$\frac{4\sigma^2\|x_i^{(n)}\|^2}{n^2} \cong \frac{4\sigma^2 a_{ii}}{n}.$$

This variance must be small compared with the square of the corresponding
diagonal element of plim $n^{-1}E_n^TE_n$, which is $\sigma^4$. Hence n
must at least satisfy

$$n > \frac{4a_{ii}}{\sigma^2}. \tag{6.9}$$

The number $\sigma/\sqrt{a_{ii}}$ is a measure of the relative size of
the perturbations in the i-th column (if the data has been adjusted,
$\sqrt{a_{ii}}$ is approximately the standard deviation of the elements
of the i-th column). For example, if the data is accurate to three
figures, then $\sigma/\sqrt{a_{ii}} = 10^{-3}$ and from (6.9) it follows
that n must be at least four million before the analysis leading to the
perturbation index is to be trusted. If we are concerned with rounding
errors on, say, a computer carrying eight decimal digits, then
$\sigma/\sqrt{a_{ii}} = 10^{-8}$ and n must be in the quadrillions.

As far as the Longley data is concerned, the largest standard deviation
occurs for the variable $x_2$ and is about $10^5$. Taking
$\sigma^2 = 1/12$, we must have

$$n > 4\cdot12\cdot10^{10} = 4.8\cdot10^{11},$$

a criterion which the sixteen observations in the Longley data set fall
short of satisfying.
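The sample sizes quoted above follow directly from (6.9). A small sketch in plain Python, with a hypothetical helper name, reproduces them:

```python
def min_sample_size(relative_error):
    """Lower bound (6.9), n > 4 a_ii / sigma^2, with relative_error = sigma / sqrt(a_ii)."""
    return 4.0 / relative_error**2

three_figures = min_sample_size(1e-3)     # data accurate to three figures
eight_digits = min_sample_size(1e-8)      # rounding on an eight-digit computer

# Longley data: standard deviation about 1e5 and sigma^2 = 1/12.
longley_bound = 4.0 * (1e5) ** 2 / (1.0 / 12.0)
```

These evaluate to about four million, $4\cdot10^{16}$, and $4.8\cdot10^{11}$ respectively, against the sixteen observations actually available.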

7.  Conclusions. 

Although we have confined our attention to the Longley data set in this
paper, the techniques that we have used are quite general. If one can
estimate the sizes of the errors in the variables, then the singular
value decomposition provides a way of seeing if they can have disastrous
effects (we again stress the need for proper scaling of $X$). The QR
decomposition allows one to search for particularly offensive columns.
Perhaps most useful of all are the sensitivity coefficients. Being
derived from a linearization of the problem, they are not valid for large
errors; however, if a problem is locally sensitive, then large errors are
unlikely to correct the difficulty. We add that efficient software for
implementing these techniques exists, and that, properly done, none of
them will cause an order of magnitude change in the costs of computation.

As  regards  the  perturbation  index,  we  recommend  that  its  use  be 
eschewed.  Although  a perturbation  index  near  to  or  greater  than  one  is 
certainly  a sign  of  trouble,  it  can  be  misleadingly  small.  Moreover,  it 
measures  effects  that,  in  most  practical  circumstances,  can  be  seen  only 
when  the  sample  size  is  astronomically  large. 